batch19
QC REPORT
Input files downloaded from:
/nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/mbrave_batch_data/batch19/
Output files are saved to:
/nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/
The consensus network .tsv file exists: TRUE
The fasta file exists: TRUE
The sample statistics file exists: TRUE
The negative control statistics file exists: TRUE
The positive control statistics file exists: TRUE
Total number of positive controls: 96
Number of positive controls per plate: 1
All plates have positive controls: TRUE
Total number of reads in positive controls: 27813
Maximum number of reads: 469 in positive control sample: CONTROL_POS_CAMP_013_G12
Minimum number of reads: 90 in positive control sample: CONTROL_POS_YARN_031_G12
Average number of positive control reads: 289.71875
Median number of positive control reads: 296.5
Read standard deviation: 71.4152723051663
Quantiles:
5%: 134.75
10%: 213
25%: 249.5
50%: 296.5
75%: 329.5
95%: 392.25
100%: 469
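The summary statistics above can be reproduced with a short sketch. It assumes the quantiles use linear interpolation between order statistics (R's default `quantile()` method, type 7) and that the standard deviation is the sample (n-1) version. The function names are illustrative, not taken from the QC pipeline.

```python
import math
import statistics

def quantile_type7(values, p):
    """Linear-interpolation quantile (R type 7) of a list of numbers, 0 <= p <= 1."""
    xs = sorted(values)
    h = (len(xs) - 1) * p              # fractional index into the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def summarize_reads(counts, probs=(0.05, 0.10, 0.25, 0.50, 0.75, 0.95, 1.0)):
    """Return the summary statistics reported for a set of per-sample read counts."""
    return {
        "total": sum(counts),
        "max": max(counts),
        "min": min(counts),
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "sd": statistics.stdev(counts),    # sample standard deviation (n - 1)
        "quantiles": {p: quantile_type7(counts, p) for p in probs},
    }

# Toy read counts, not data from this batch
stats = summarize_reads([90, 113, 250, 300, 330, 469])
print(stats["median"], stats["quantiles"][0.05])
```

Applied to the 96 positive-control read counts, `summarize_reads` should give one value per line of the summary above, provided the interpolation rule matches the one the pipeline uses.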
Blue solid line: read mean
Orange dotted lines: lower 5% and 10% quantiles
Number of positive control samples in the lower 5% quantile: 5
CONTROL_POS_FACE_209_G12
CONTROL_POS_FACE_232_G12
CONTROL_POS_PBRI_001_G12
CONTROL_POS_PBRI_004_G12
CONTROL_POS_YARN_031_G12
Names of the associated partners: BIFOR, PBRI, YARN
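The lower-tail flagging can be sketched as follows. Two details are assumptions read off this report rather than documented rules. First, a control is flagged when its read count is at or below the 5% quantile; all five controls listed above have fewer than 134.75 reads. Second, the plate ID is taken as the two name fields before the well position. The quantile helper repeats the linear-interpolation rule so that the block is self-contained.

```python
import math

def quantile_type7(values, p):
    """Linear-interpolation quantile (R type 7), 0 <= p <= 1."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def plate_of(sample_id):
    """'CONTROL_POS_YARN_031_G12' -> 'YARN_031' (partner code and plate number)."""
    return "_".join(sample_id.split("_")[-3:-1])

def flag_low_controls(reads_by_control, p=0.05):
    """Return the control IDs at or below the p-quantile and the plates they sit on."""
    cutoff = quantile_type7(list(reads_by_control.values()), p)
    flagged = sorted(cid for cid, n in reads_by_control.items() if n <= cutoff)
    return flagged, sorted({plate_of(cid) for cid in flagged})

# Toy counts for five hypothetical plates
controls = {
    "CONTROL_POS_AAAA_001_G12": 90,
    "CONTROL_POS_BBBB_002_G12": 300,
    "CONTROL_POS_CCCC_003_G12": 310,
    "CONTROL_POS_DDDD_004_G12": 320,
    "CONTROL_POS_EEEE_005_G12": 330,
}
print(flag_low_controls(controls))
```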
Total number of negative controls: 500
Total number of lysate negative controls: 417
Total number of empty negative controls: 83
Number of negative controls per plate:
| Number of negative controls per plate | Number of plates |
|---|---|
| 1 | 7 |
| 2 | 76 |
| more than 2 | 13 |
All plates have negative controls: TRUE
Total number of reads in lysate negative controls: 2096
Total number of reads in empty negative controls: 62
Maximum number of reads: 230 in lysate negative control sample: CONTROL_NEG_LYSATE_CAMP_017_H12
Maximum number of reads: 6 in empty negative control sample: CONTROL_NEG_WWTS_005_A8
Zero reads in: 286 negative control samples
In lysate controls: 240
In empty controls: 46
Average number of negative control reads: 4.316
In lysate controls: 5.02637889688249
In empty controls: 0.746987951807229
Median number of negative control reads: 0
In lysate controls: 0
In empty controls: 0
Skewness of negative control read counts: 9.3101349995259
In lysate controls: 8.52792548498113
In empty controls: 2.17703718332309
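Skewness summarises how lopsided a distribution is. The strongly positive values above reflect controls that mostly have zero reads, with a long tail of contaminated controls. The sketch below uses the moment coefficient g1 = m3 / m2^(3/2). The report does not state whether the pipeline uses this formula or one of the bias-adjusted variants offered by R packages, so treat the formula as an assumption.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Mostly zeros with one larger count gives a positive (right) skew,
# the same shape as the negative-control read counts
print(skewness([0, 0, 0, 1]))
```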
Quantiles in lysate controls:
5%: 0
10%: 0
25%: 0
50%: 0
75%: 3
95%: 30
98%: 37.68
Quantiles in empty controls:
5%: 0
10%: 0
25%: 0
50%: 0
75%: 1
95%: 2
98%: 3.72
Blue solid line: read mean
Orange dotted lines: upper 5% and 2% of samples with the highest number of reads
Number of negative control samples in the top 5%: 26
Of these, in lysate controls: 26
Of these, in empty controls: 0
Number of negative control samples in the top 2%: 10
Of these, in lysate controls: 10
Of these, in empty controls: 0
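The upper-tail counts can be sketched by taking a single cut-off across all negative controls and then splitting the flagged controls by type. Several choices here are assumptions. The report does not say whether the cut-off is computed over all controls or separately for each control type. It also does not say whether a count equal to the cut-off is included. In this sketch, a control is flagged when its read count is at or above the cut-off. Lysate controls are recognised by `LYSATE` in their names, as in `CONTROL_NEG_LYSATE_CAMP_017_H12`.

```python
import math

def quantile_type7(values, p):
    """Linear-interpolation quantile (R type 7), 0 <= p <= 1."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def upper_tail_by_type(reads_by_control, top_fraction):
    """Count negative controls at or above the (1 - top_fraction) quantile of all
    negative controls, split into lysate and empty controls."""
    cutoff = quantile_type7(list(reads_by_control.values()), 1 - top_fraction)
    flagged = [cid for cid, n in reads_by_control.items() if n >= cutoff]
    lysate = sum("LYSATE" in cid for cid in flagged)
    return {"total": len(flagged), "lysate": lysate, "empty": len(flagged) - lysate}

# Toy negative controls on hypothetical plates
negatives = {
    "CONTROL_NEG_LYSATE_AAAA_001_H12": 230,
    "CONTROL_NEG_LYSATE_BBBB_002_H12": 40,
    "CONTROL_NEG_CCCC_003_A8": 6,
    "CONTROL_NEG_LYSATE_DDDD_004_H12": 0,
    "CONTROL_NEG_EEEE_005_A8": 0,
}
print(upper_tail_by_type(negatives, 0.05))
```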
Names of the associated partners: CAMP, BIFOR, NENM, PBRI, WWTS
Number of samples in the batch (excluding controls): 8620
Total number of partner plates: 96
Total number of sample reads: 2813154
Maximum number of sample reads: 803 in sample: FACE_220_D9
Minimum number of sample reads: 0 in 336 samples
which is 3.89791183294664 % of all samples
Average number of reads: 326.351972157773
Median number of reads: 362
Read standard deviation: 152.439735911066
Skewness of sample read counts: -0.660035204264651
Quantiles:
5%: 1
10%: 68.9
25%: 240
50%: 362
75%: 435
95%: 523
100%: 803
Blue solid line: read mean
Orange dotted lines: lower 5% and 10% of samples
Number of samples in the lower 10%: 862 out of 8620 samples
Number of samples in the lower 5%: 470 out of 8620 samples
Partners associated with the bottom 5% of samples by read count:
| Partner names | Frequency |
|---|---|
| BIFOR | 170 |
| NENM | 102 |
| CAMP | 61 |
| YARN | 51 |
| WWTS | 42 |
| PBRI | 40 |
| SNST | 4 |
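The partner frequency table can be built by counting the partner code of each low-read sample. The mapping from plate prefix to partner name is an assumption. In this report, plates with the prefix `FACE` appear under the partner name BIFOR, so the sketch accepts an optional, hypothetical lookup table and otherwise falls back to the plate prefix.

```python
from collections import Counter

def partner_frequencies(sample_ids, prefix_to_partner=None):
    """Tally low-read samples per partner, most frequent first."""
    lookup = prefix_to_partner or {}
    counts = Counter()
    for sid in sample_ids:
        prefix = sid.split("_")[0]                 # 'FACE_220_D9' -> 'FACE'
        counts[lookup.get(prefix, prefix)] += 1
    return counts.most_common()

# Hypothetical mapping inferred from this report, not taken from the pipeline
low_samples = ["FACE_220_D9", "FACE_209_A1", "CAMP_039_B2"]
print(partner_frequencies(low_samples, {"FACE": "BIFOR"}))
```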
Number of samples with 0 reads: 336
Plates where the 75th percentile of sample read counts is lower than the expected mean read count (dark grey):
CAMP_039
FACE_209
FACE_227
PBRI_005
SNST_005
SNST_007
YARN_026
YARN_031
which constitutes 8.33333333333333 % of all partner plates in this batch
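The plate-level check can be sketched as follows: compute the 75th percentile of sample read counts on each partner plate, and flag the plate when that value falls below the expected mean. The report does not define the expected mean. The sketch assumes it is a single batch-wide value supplied by the caller; the example uses the batch average of roughly 326.35 reads. Quantiles use linear interpolation.

```python
import math
from collections import defaultdict

def quantile_type7(values, p):
    """Linear-interpolation quantile (R type 7), 0 <= p <= 1."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def low_plates(reads_by_sample, expected_mean):
    """Return the sorted plates whose 75th-percentile read count is below
    expected_mean, plus their share of all plates as a percentage."""
    by_plate = defaultdict(list)
    for sample_id, reads in reads_by_sample.items():
        plate = "_".join(sample_id.split("_")[:2])   # 'FACE_220_D9' -> 'FACE_220'
        by_plate[plate].append(reads)
    flagged = sorted(plate for plate, counts in by_plate.items()
                     if quantile_type7(counts, 0.75) < expected_mean)
    return flagged, 100 * len(flagged) / len(by_plate)

# Toy data for two hypothetical plates
reads = {"AAAA_001_A1": 10, "AAAA_001_A2": 20, "BBBB_002_A1": 400, "BBBB_002_A2": 500}
print(low_plates(reads, expected_mean=326.35))
```

For this batch, 8 flagged plates out of 96 give 100 * 8 / 96 ≈ 8.33%, which is the share reported above.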
Grey line: median
Brown line: mean
Green data points: positive controls
Blue data points: empty negative controls
Navy data points: lysate negative controls
Percentage of samples from low-performance partner plates that are also on low-performance UMI plates (purple data points): 0 %
Assessment of the positive controls with low read counts identified in the previous steps:
FACE_209 More reads in positive control than in samples on average.
Observed number of reads: 113 Expected: 105.440860215054
FACE_232 Positive control failed.
Observed number of reads: 133 Expected: 225.161290322581
PBRI_001 Positive control failed.
Observed number of reads: 131 Expected: 308.591397849462
PBRI_004 Positive control failed.
Observed number of reads: 134 Expected: 247.430107526882
YARN_031 Positive control failed.
Observed number of reads: 90 Expected: 214.537634408602
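Each expected value above equals a whole number of reads divided by 93. For example, 105.44 × 93 ≈ 9806 for FACE_209. This is consistent with the expected value being the mean read count of the 93 non-control samples on the plate. Treating that interpretation as an assumption, the pass/fail rule can be sketched as:

```python
def assess_positive_control(control_reads, sample_reads):
    """Compare a plate's positive-control read count with the mean read count of the
    plate's non-control samples; return (verdict, expected)."""
    expected = sum(sample_reads) / len(sample_reads)
    if control_reads > expected:
        verdict = "More reads in positive control than in samples on average."
    else:
        verdict = "Positive control failed."   # ties count as failures (assumption)
    return verdict, expected

# Toy plate: 113 control reads against samples averaging 105 reads
print(assess_positive_control(113, [100, 110]))
```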
YARN_031
FACE_209
The above plates have a lower-than-expected number of reads AND failed positive controls.
THESE PLATES NEED TO BE EXAMINED FURTHER
Low-quality plates are displayed here. All other plates are plotted in the last part of this report.
Green squares: controls [any kind]
### Assessment of sequence conflicts and contaminants
Positive control as contamination source
NOTE: All sample and sequence IDs match - data successfully merged
Positive control OTU is TAX:1287025
Non-positive-control samples that contain positive control reads:
| Sample | Control Sequence Count | Sequence Similarity | Sequence Type | UMI Plate ID |
|---|---|---|---|---|
| FACE_240_B3 | 1 | 99.23195 | secondary | 8 |
Number of samples with positive control OTU as primary sequence: NA
Number of samples with positive control OTU as secondary sequence: 1
out of 4842 samples with secondary sequences
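The positive-control contamination check can be sketched as a scan over per-sample sequence hits. Any non-control sample that carries the positive-control OTU is reported. Only samples where that OTU is the primary sequence are candidates for removal, which matches the outcome of this batch: there is one secondary hit and no samples are removed. The record layout (sample ID, OTU, rank, read count) is a hypothetical layout chosen for illustration, and `TAX:OTHER` is a placeholder OTU.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    sample: str
    otu: str
    rank: str       # "primary" or "secondary"
    reads: int

def control_contamination(hits, control_otu):
    """Return (primary, secondary): sorted IDs of non-control samples that carry the
    positive-control OTU as their primary or as a secondary sequence."""
    found = [h for h in hits
             if h.otu == control_otu and not h.sample.startswith("CONTROL_")]
    primary = sorted({h.sample for h in found if h.rank == "primary"})
    secondary = sorted({h.sample for h in found if h.rank == "secondary"})
    return primary, secondary

hits = [
    Hit("CONTROL_POS_CAMP_013_G12", "TAX:1287025", "primary", 469),
    Hit("FACE_240_B3", "TAX:1287025", "secondary", 1),
    Hit("FACE_240_B4", "TAX:OTHER", "primary", 300),   # placeholder OTU
]
print(control_contamination(hits, "TAX:1287025"))
```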
Location of the contaminants relative to the source:
Orange squares: positive controls
Green squares: samples with positive control contamination
Read count mean of all secondary sequences in all samples: 5.10170152601043
Read count mean of all positive control sequences in other samples: 1
Read count median of all secondary sequences in all samples: 1
Read count median of all positive control sequences in other samples: 1
Blue solid line: secondary hit read mean
Orange dotted lines: mean number of reads found as secondary contaminants from the positive controls in other samples
Both lines should lie close together, meaning that secondary contamination from positive controls is comparable to the potential contamination in other samples.
There are no samples that contain positive control reads as the primary hit
NO SAMPLES TO BE REMOVED
NOTE: the above samples are automatically removed if:
Negative control contamination
Distribution of reads in negative controls
NOTE: the contamination source can be either the primary or a secondary sequence within a sample!
| Family | No. Source Samples |
|---|---|
| Chironomidae | 64 |
| Entomobryidae | 34 |
| Hominidae | 7 |
| Sciaridae | 7 |
| Agromyzidae | 4 |
| Coccinellidae | 3 |
| None | 2 |
| Psychodidae | 2 |
Outline: negative controls with contaminants
Colour of the outline indicates the partner, so that samples can be tracked between partner and UMI plates.
Thicker chartreuse outline: FAILED negative controls with contaminants [2%]
Numbers indicate the read count
Squares that are not outlined represent potential sources of contamination within plates: identical sequences were found in these wells and in negative controls.
NOTE: Controls are not included!
Number of wells with a primary sequence only: 3430
Number of wells with primary and secondary sequences: 4793
Number of primary chimeric sequences: 28
Number of secondary chimeric sequences: 5496
NOTE: All secondary chimeric sequences successfully removed
Number of samples with only primary chimeric sequence recognised: 12
We do not know how mBRAVE recognises chimeras; for now, only samples represented by fewer than 5 reads are removed.
Retained samples: 2
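The chimera rule stated above can be expressed as a simple filter. The record fields (`id`, `reads`, `primary_chimera`) are hypothetical. The sketch encodes only the stated threshold of fewer than 5 reads and does not attempt to reproduce mBRAVE's chimera detection.

```python
def filter_primary_chimeras(samples, min_reads=5):
    """Split samples into kept and removed IDs: a sample whose primary sequence is
    flagged as chimeric is removed only if it has fewer than min_reads reads."""
    kept, removed = [], []
    for s in samples:
        if s["primary_chimera"] and s["reads"] < min_reads:
            removed.append(s["id"])
        else:
            kept.append(s["id"])
    return kept, removed

samples = [
    {"id": "S1", "reads": 3, "primary_chimera": True},    # removed
    {"id": "S2", "reads": 40, "primary_chimera": True},   # retained chimera
    {"id": "S3", "reads": 2, "primary_chimera": False},   # not a chimera
]
print(filter_primary_chimeras(samples))
```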
Number of EXCLUDED primary sequences: 469
which constitutes 5.71045902836966 % of all samples
These samples are not being removed here; this is an mBRAVE cut-off.
Number of primary sequences with no taxonomy assigned: 110
which constitutes 1.33934007061975 % of all samples
These samples will be examined further.
Number of samples with no taxonomy assigned whose primary sequence will be replaced with the secondary sequence, based on sequence similarity: 19
All other samples with no taxonomy assigned to the primary sequence will remain unchanged.
If the first entry is not ‘Arthropod’, then the second entry is likely correct [based on manual observations]
Number of samples with Wolbachia detected: 421
Table with plate positions, number of reads, and sequences saved to the output directory:
/nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/
Number of samples with Nematoda, Tardigrada, Annelida, and/or Rotifera detected: 71
Table with plate positions, number of reads, and sequences saved to the output directory:
/nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/
| Taxon | Frequency |
|---|---|
| Chordata | 38 |
| Nematoda | 2 |
| Proteobacteria | 64 |
| Rotifera | 1 |
105 wells had primary non-Arthropod hits and secondary Arthropod hits
NOTE: These primary hits will be replaced.
Samples with only non-Arthropod sequences detected: 56
NOTE: These samples have been excluded!
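The handling of primary hits that are not Arthropoda can be sketched as follows. If an Arthropoda secondary sequence exists, it is promoted; otherwise the sample is excluded. Two parts of this are assumptions. The first is that the secondary with the most reads is promoted. The second is the `(phylum, reads)` tuple layout. Primary hits with no taxonomy (phylum `None`) are left unchanged here, because the report handles them in a separate, similarity-based step that this sketch does not model.

```python
def resolve_primary(primary, secondaries):
    """primary and secondaries are (phylum, reads) tuples; phylum None = no taxonomy.

    Returns ("keep", hit), ("replace", hit) or ("exclude", None).
    """
    phylum, _ = primary
    if phylum is None or phylum == "Arthropoda":
        return "keep", primary          # no-taxonomy hits are handled separately
    arthropods = [s for s in secondaries if s[0] == "Arthropoda"]
    if arthropods:
        # Promote the best-supported Arthropoda secondary (assumed tie-break rule)
        return "replace", max(arthropods, key=lambda s: s[1])
    return "exclude", None              # only non-Arthropod sequences detected

print(resolve_primary(("Proteobacteria", 300), [("Arthropoda", 12), ("Arthropoda", 40)]))
```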
Total number of wells with Anopheles reads: 82
| Plate | Sequence | No. samples with Anopheles reads |
|---|---|---|
| CAMP_014 | secondary | 1 |
| CAMP_027 | primary | 8 |
| CAMP_027 | secondary | 1 |
| CAMP_038 | primary | 1 |
| CAMP_038 | secondary | 2 |
| CAMP_039 | primary | 5 |
| CAMP_039 | secondary | 6 |
| CAMP_040 | primary | 4 |
| CAMP_040 | secondary | 2 |
| CAMP_041 | primary | 4 |
| CAMP_041 | secondary | 4 |
| CAMP_042 | primary | 10 |
| CAMP_043 | primary | 15 |
| CAMP_043 | secondary | 3 |
| CAMP_044 | primary | 4 |
| CAMP_044 | secondary | 2 |
| CAMP_045 | primary | 1 |
| CAMP_046 | primary | 3 |
| CAMP_046 | secondary | 1 |
| SNST_007 | primary | 3 |
| SNST_007 | secondary | 1 |
| WWTS_003 | secondary | 1 |
| Partner | Plate | No. Anopheles primary sequences with 200+ reads |
|---|---|---|
| CAMP | 043 | 15 |
| CAMP | 042 | 10 |
| CAMP | 027 | 8 |
| CAMP | 040 | 4 |
| CAMP | 039 | 3 |
| CAMP | 044 | 3 |
| CAMP | 046 | 3 |
| CAMP | 041 | 2 |
| CAMP | 038 | 1 |
| CAMP | 045 | 1 |
Number of primary African Anopheline hits [200 or more reads]: 50
NOTE: All primary mosquito samples removed!
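The Anopheles bookkeeping can be sketched as a single pass over the detected sequences. Wells are tallied per plate and sequence rank, as in the first table above, and primary Anopheles hits with at least 200 reads are collected for removal. The tuple layout is hypothetical. The 200-read removal threshold is an assumption taken from the count line above; because the note says that all primary mosquito samples are removed, passing `min_reads=0` covers that stricter reading. The sketch also assumes each well has at most one Anopheles hit per rank.

```python
from collections import Counter

def anopheles_summary(hits, min_reads=200):
    """hits: iterable of (sample_id, rank, genus, reads), one per detected sequence.

    Returns a Counter of Anopheles hits per (plate, rank) and the sorted sample IDs
    whose primary sequence is Anopheles with at least min_reads reads.
    """
    per_plate = Counter()
    to_remove = set()
    for sample_id, rank, genus, reads in hits:
        if genus != "Anopheles":
            continue
        plate = "_".join(sample_id.split("_")[:2])   # 'CAMP_043_A1' -> 'CAMP_043'
        per_plate[(plate, rank)] += 1
        if rank == "primary" and reads >= min_reads:
            to_remove.add(sample_id)
    return per_plate, sorted(to_remove)

# Toy hits; well positions and read counts are made up
anopheles_hits = [
    ("CAMP_043_A1", "primary", "Anopheles", 350),
    ("CAMP_043_A2", "primary", "Anopheles", 120),
    ("CAMP_043_A3", "secondary", "Anopheles", 4),
    ("SNST_007_B1", "primary", "Chironomus", 400),
]
print(anopheles_summary(anopheles_hits))
```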
Number of samples with only primary Arthropod sequence: 6290
79.309040474089 % of all remaining samples
Number of samples where secondary sequence is not present elsewhere on the partner or UMI plate: 0
Number of conflicting sequences [sequences are in different families or orders, both have good read support]: 130
| Primary hit | Number |
|---|---|
| Arthropoda | 7830 |
| None | 84 |
Number of retained samples: 7914
Number of Arthropod samples assigned by mBRAVE [this includes samples with fewer than 5 reads that have now been excluded!]: 7983
Number of samples with replaced sequences: 37
Retained chimeras: 11
Retained samples with no taxonomy: 84
Each retrieved sample has only one sequence: TRUE
| Number of samples | Description | Category | Decision |
|---|---|---|---|
| 5262 | Only one sequence with more than 200 reads, no secondary sequence detected | 1 | YES |
| 814 | Only one sequence with 50 to 200 reads, no secondary sequence detected | 2 | YES |
| 202 | Only one sequence with 5 or more but fewer than 50 reads, no secondary sequence detected | 3 | YES |
| 120 | Dominant sequence with more than 200 reads, non-conflicting secondary sequences with 5 or fewer reads | 4 | YES |
| 34 | Dominant sequence with 50 to 200 reads, non-conflicting secondary sequences with 5 or fewer reads | 5 | YES |
| 759 | Dominant sequence with more than 200 reads, conflicting secondary sequences with 5 or fewer reads | 6 | YES |
| 316 | Dominant sequence with 50 to 200 reads, conflicting secondary sequences with 5 or fewer reads | 7 | YES |
| 170 | Dominant sequence with more than 200 reads, secondary sequences with more than 5 supporting reads | 8 | NO |
| 140 | Dominant sequence with 50 to 200 reads, secondary sequences with more than 5 supporting reads | 9 | NO |
| 60 | Dominant sequence with 5 or more but fewer than 50 reads, non-conflicting secondary sequences with fewer than 5 reads | 10 | NO |
| 37 | Dominant sequence with more than 5 but fewer than 50 reads, any other secondary sequences present | 11 | NO |
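The category table can be read as a decision procedure. The sketch below encodes it under the following assumptions:

- A count of exactly 200 reads falls in the 50-to-200 band.
- Where rows overlap, category 10 is checked before category 11.
- Samples with fewer than 5 primary reads have already been removed.
- Whether a secondary sequence conflicts (different family or order, with good read support) is supplied by the caller.
- Uncategorised samples are treated as NO.

The class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Secondary:
    reads: int
    conflicting: bool   # different family or order from the dominant sequence

YES_CATEGORIES = {1, 2, 3, 4, 5, 6, 7}

def read_band(reads):
    """Band of the dominant sequence; exactly 200 reads falls in the 50-200 band."""
    if reads > 200:
        return "high"
    if reads >= 50:
        return "mid"
    if reads >= 5:
        return "low"
    return None          # fewer than 5 reads: assumed removed in an earlier step

def categorise(primary_reads, secondaries):
    """Return the category number (1-11) from the table, or None if no row applies."""
    band = read_band(primary_reads)
    if band is None:
        return None
    if not secondaries:
        return {"high": 1, "mid": 2, "low": 3}[band]
    max_secondary = max(s.reads for s in secondaries)
    any_conflict = any(s.conflicting for s in secondaries)
    if band == "low":
        if not any_conflict and max_secondary < 5:
            return 10
        return 11 if primary_reads > 5 else None
    if max_secondary > 5:
        return 8 if band == "high" else 9
    if any_conflict:
        return 6 if band == "high" else 7
    return 4 if band == "high" else 5

def decision(category):
    """YES for categories 1-7; NO otherwise (uncategorised samples assumed NO)."""
    return "YES" if category in YES_CATEGORIES else "NO"

print(categorise(300, [Secondary(2, True)]), decision(categorise(300, [Secondary(2, True)])))
```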
| Decision category | Number of samples |
|---|---|
| NO | 407 |
| YES | 7507 |
8.19025522041764 % OF SAMPLES EXCLUDED [all samples]
12.9118329466357 % OF SAMPLES EXCLUDED [counting only approved (YES) samples as retained]
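The two exclusion percentages follow directly from counts reported above. The batch contains 8620 samples; 7914 are retained after QC, and 7507 of those have decision YES. The first figure counts every retained sample as kept. The second counts only the YES samples as kept.

```python
total_samples = 8620      # samples in the batch, excluding controls
retained_post_qc = 7914   # retained samples (decision YES or NO)
approved = 7507           # samples with decision YES

excluded_all = 100 * (total_samples - retained_post_qc) / total_samples
excluded_approved = 100 * (total_samples - approved) / total_samples
print(f"{excluded_all:.4f}% {excluded_approved:.4f}%")
```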
NOTE: The heatmaps below show only the retained samples. Controls, chimeric samples, non-Arthropod samples, and samples with no taxonomy assigned have been removed or replaced!
Final fasta file successfully saved: /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/BOLD_filtered_sequences_batch19.fasta
Final metadata file successfully saved: /nfs/users/nfs_a/aw43/aw43/2024_07_bioscan_qc/input/output/qc_reports/batch19/BOLDfiltered_metadata_batch19.csv
The report and output files have been successfully generated!
| Plate | Original number of samples | No. samples post-QC | No. confident samples | Percentage of confident samples |
|---|---|---|---|---|
| PBRI_010 | 30 | 30 | 30 | 100.00000 |
| NENM_036 | 93 | 92 | 92 | 98.92473 |
| NENM_038 | 93 | 92 | 92 | 98.92473 |
| YARN_026 | 93 | 92 | 92 | 98.92473 |
| NENM_035 | 93 | 91 | 91 | 97.84946 |
| YARN_024 | 93 | 91 | 91 | 97.84946 |
| YARN_028 | 93 | 91 | 91 | 97.84946 |
| FACE_238 | 93 | 90 | 90 | 96.77419 |
| PBRI_002 | 93 | 91 | 90 | 96.77419 |
| PBRI_008 | 93 | 92 | 90 | 96.77419 |
| SHAP_026 | 93 | 92 | 90 | 96.77419 |
| YARN_027 | 93 | 92 | 90 | 96.77419 |
| FACE_239 | 93 | 91 | 89 | 95.69892 |
| SHAP_025 | 93 | 91 | 89 | 95.69892 |
| SNST_004 | 93 | 93 | 89 | 95.69892 |
| YARN_025 | 93 | 89 | 89 | 95.69892 |
| YARN_029 | 92 | 89 | 88 | 95.65217 |
| SNST_001 | 89 | 87 | 85 | 95.50562 |
| SNST_003 | 94 | 92 | 89 | 94.68085 |
| FACE_229 | 93 | 89 | 88 | 94.62366 |
| PBRI_007 | 93 | 89 | 88 | 94.62366 |
| SNST_002 | 92 | 89 | 87 | 94.56522 |
| SNST_005 | 92 | 88 | 87 | 94.56522 |
| CAMP_013 | 94 | 94 | 88 | 93.61702 |
| FACE_240 | 93 | 88 | 87 | 93.54839 |
| FACE_241 | 93 | 90 | 87 | 93.54839 |
| NENM_029 | 93 | 87 | 87 | 93.54839 |
| NENM_031 | 93 | 87 | 87 | 93.54839 |
| NENM_033 | 93 | 87 | 87 | 93.54839 |
| NENM_037 | 93 | 88 | 87 | 93.54839 |
| SNST_006 | 93 | 92 | 87 | 93.54839 |
| CAMP_048 | 93 | 90 | 86 | 92.47312 |
| FACE_237 | 93 | 90 | 86 | 92.47312 |
| NENM_028 | 93 | 86 | 86 | 92.47312 |
| NENM_030 | 93 | 86 | 86 | 92.47312 |
| NENM_032 | 93 | 86 | 86 | 92.47312 |
| WWTS_009 | 93 | 89 | 86 | 92.47312 |
| CAMP_038 | 93 | 88 | 85 | 91.39785 |
| FACE_228 | 93 | 87 | 85 | 91.39785 |
| FACE_235 | 93 | 85 | 85 | 91.39785 |
| PBRI_006 | 93 | 86 | 85 | 91.39785 |
| CAMP_047 | 93 | 91 | 84 | 90.32258 |
| FACE_219 | 93 | 88 | 84 | 90.32258 |
| NENM_027 | 93 | 84 | 84 | 90.32258 |
| WWTS_004 | 93 | 90 | 84 | 90.32258 |
| CAMP_014 | 94 | 87 | 84 | 89.36170 |
| CAMP_017 | 94 | 92 | 84 | 89.36170 |
| CAMP_042 | 93 | 83 | 83 | 89.24731 |
| SNST_007 | 34 | 30 | 30 | 88.23529 |
| FACE_231 | 93 | 88 | 82 | 88.17204 |
| FACE_234 | 93 | 85 | 82 | 88.17204 |
| NENM_039 | 93 | 82 | 82 | 88.17204 |
| WWTS_005 | 93 | 91 | 82 | 88.17204 |
| CAMP_026 | 74 | 66 | 65 | 87.83784 |
| CAMP_045 | 32 | 29 | 28 | 87.50000 |
| FACE_233 | 93 | 86 | 81 | 87.09677 |
| FACE_236 | 93 | 88 | 81 | 87.09677 |
| CAMP_027 | 74 | 65 | 64 | 86.48649 |
| FACE_222 | 93 | 88 | 80 | 86.02151 |
| PBRI_001 | 93 | 85 | 80 | 86.02151 |
| PBRI_004 | 93 | 82 | 80 | 86.02151 |
| CAMP_015 | 94 | 91 | 80 | 85.10638 |
| FACE_225 | 93 | 83 | 79 | 84.94624 |
| NENM_034 | 93 | 83 | 79 | 84.94624 |
| WWTS_008 | 93 | 84 | 79 | 84.94624 |
| PBRI_005 | 93 | 83 | 78 | 83.87097 |
| FACE_220 | 91 | 80 | 76 | 83.51648 |
| CAMP_018 | 94 | 89 | 78 | 82.97872 |
| CAMP_019 | 47 | 46 | 39 | 82.97872 |
| CAMP_044 | 93 | 83 | 77 | 82.79570 |
| FACE_221 | 93 | 83 | 77 | 82.79570 |
| YARN_032 | 93 | 78 | 77 | 82.79570 |
| CAMP_041 | 93 | 84 | 76 | 81.72043 |
| WWTS_006 | 93 | 87 | 76 | 81.72043 |
| WWTS_010 | 83 | 74 | 67 | 80.72289 |
| FACE_209 | 93 | 84 | 75 | 80.64516 |
| FACE_232 | 93 | 82 | 75 | 80.64516 |
| PBRI_009 | 93 | 91 | 75 | 80.64516 |
| WWTS_001 | 93 | 90 | 75 | 80.64516 |
| WWTS_007 | 93 | 83 | 75 | 80.64516 |
| YARN_030 | 93 | 81 | 74 | 79.56989 |
| WWTS_003 | 93 | 89 | 73 | 78.49462 |
| CAMP_040 | 93 | 77 | 72 | 77.41935 |
| FACE_223 | 93 | 83 | 72 | 77.41935 |
| YARN_031 | 93 | 74 | 72 | 77.41935 |
| CAMP_016 | 94 | 87 | 72 | 76.59574 |
| CAMP_043 | 93 | 74 | 71 | 76.34409 |
| CAMP_039 | 93 | 81 | 70 | 75.26882 |
| FACE_230 | 93 | 70 | 69 | 74.19355 |
| CAMP_046 | 93 | 72 | 68 | 73.11828 |
| FACE_224 | 93 | 80 | 68 | 73.11828 |
| PBRI_003 | 93 | 76 | 68 | 73.11828 |
| FACE_226 | 93 | 66 | 62 | 66.66667 |
| WWTS_002 | 93 | 82 | 58 | 62.36559 |
| FACE_227 | 93 | 55 | 51 | 54.83871 |
| NENM_040 | 64 | 20 | 20 | 31.25000 |
| Partner | Original number of samples | No. samples post-QC | No. confident samples | Percentage of confident samples |
|---|---|---|---|---|
| SHAP | 186 | 183 | 179 | 96.23656 |
| SNST | 587 | 571 | 554 | 94.37819 |
| YARN | 836 | 777 | 764 | 91.38756 |
| NENM | 1273 | 1151 | 1146 | 90.02357 |
| PBRI | 867 | 805 | 764 | 88.11995 |
| FACE | 2230 | 1999 | 1891 | 84.79821 |
| CAMP | 1721 | 1569 | 1454 | 84.48576 |
| WWTS | 920 | 859 | 755 | 82.06522 |
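In these tables, the percentage of confident samples is the number of confident samples divided by the original number of samples; for example, NENM_040 has 20 / 64 = 31.25%. The partner rows are sums over each partner's plates. The sketch below rebuilds partner rows from plate rows, and the tuple layout is an assumption. Sorting by descending percentage with ties broken alphabetically matches the order of the plate table. Using the two SHAP plates reproduces the SHAP row above: 186, 183, 179, 96.24%.

```python
from collections import defaultdict

def partner_summary(plate_rows):
    """plate_rows: (plate, original, post_qc, confident) tuples. Returns per-partner
    (partner, original, post_qc, confident, percent_confident) rows, sorted by
    descending percentage, ties broken by partner name."""
    totals = defaultdict(lambda: [0, 0, 0])
    for plate, original, post_qc, confident in plate_rows:
        t = totals[plate.split("_")[0]]          # 'SHAP_025' -> 'SHAP'
        t[0] += original
        t[1] += post_qc
        t[2] += confident
    rows = [(partner, o, p, c, 100 * c / o) for partner, (o, p, c) in totals.items()]
    return sorted(rows, key=lambda r: (-r[4], r[0]))

print(partner_summary([("SHAP_025", 93, 91, 89), ("SHAP_026", 93, 92, 90)]))
```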
| UMI plate | Original number of samples | No. samples post-QC | No. confident samples | Percentage of confident samples |
|---|---|---|---|---|
| 22 | 371 | 364 | 361 | 97.30458 |
| 10 | 372 | 354 | 353 | 94.89247 |
| 8 | 372 | 359 | 353 | 94.89247 |
| 16 | 312 | 303 | 293 | 93.91026 |
| 24 | 372 | 364 | 349 | 93.81720 |
| 19 | 372 | 343 | 343 | 92.20430 |
| 21 | 362 | 343 | 333 | 91.98895 |
| 9 | 311 | 290 | 285 | 91.63987 |
| 15 | 368 | 355 | 337 | 91.57609 |
| 7 | 372 | 348 | 334 | 89.78495 |
| 12 | 372 | 340 | 331 | 88.97849 |
| 20 | 372 | 340 | 327 | 87.90323 |
| 13 | 309 | 303 | 270 | 87.37864 |
| 17 | 376 | 359 | 324 | 86.17021 |
| 18 | 309 | 292 | 265 | 85.76052 |
| 3 | 370 | 335 | 312 | 84.32432 |
| 2 | 353 | 306 | 296 | 83.85269 |
| 6 | 372 | 326 | 307 | 82.52688 |
| 1 | 372 | 330 | 303 | 81.45161 |
| 4 | 372 | 334 | 299 | 80.37634 |
| 14 | 372 | 352 | 297 | 79.83871 |
| 23 | 372 | 305 | 291 | 78.22581 |
| 5 | 372 | 297 | 286 | 76.88172 |
| 11 | 343 | 272 | 258 | 75.21866 |
Plates with a low number of reads and a low number of retained samples should be examined!
Failed negative controls [2%] with contamination other than Bovidae:
CONTROL_NEG_LYSATE_CAMP_016_H12
CONTROL_NEG_LYSATE_CAMP_017_H12
CONTROL_NEG_LYSATE_FACE_222_H12
These negative controls may contain insects!